-
Notifications
You must be signed in to change notification settings - Fork 349
Math: FFT: Move twiddle factors data to DRAM #10457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This patch moves the twiddle factors tables to sof/src/math/fft/coef directory. The purpose is to make the FFT library more modular. Also there is no usage for the coefficients data beyond local usage by the FFT functions. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch sets the twiddle factors data as __cold_rodata when MATH_FFT_COLD_TWIDDLE_FACTORS is set to y. The twiddle factors for the maximum FFT size are linked to DRAM and the needed coefficients are copied to SRAM when the FFT plan is initialized. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
| mod_free(mod, plan->tmp_i32[0]); | ||
|
|
||
| mod_free(mod, plan->bit_reverse_idx); | ||
| mod_free(mod, plan->multi_twiddle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shared_twiddle?
lgirdwood
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one open around alignment given recent SIMD vector alignment around pointer assignments.
| struct icomplex32 *outb32; /* pointer to output integer complex buffer */ | ||
| struct icomplex16 *inb16; /* pointer to input integer complex buffer */ | ||
| struct icomplex16 *outb16; /* pointer to output integer complex buffer */ | ||
| void *twiddle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use __builtin_assume_aligned(ptr, SIMD_VECTOR_SIZE) when we assign these ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgirdwood you mean in fft_common.c at line 139 and in fft_multi.c at line 145 below? Hm, I'm reading, that __builtin_assume_aligned() is a "hint to the compiler" that the address is aligned. What is it needed here for? To avoid the compiler doing additional alignment for SIMD / HiFi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 2 main reasons:
- hint - dont generate any code that deals with unaligned header data. Autovectorizers also greatly benefit for these hints. i.e. vectorize if you can and it makes sense
- hint - generate warnings if unaligned usage is detected, although I'm not sure if gcc/clang can fully do this today with current options (it may throw a "cant vectorize due to X type warning though). I'm hoping we can get some benefit here as I want to avoid the silent hang we had recently with the alignment mismatch when a pointer was assigned and cast incorrectly.
@singalsu if you also iterate our loops by vector size it will also mean CC should not generate any trailing bytes handling code too. Especially important for non intrinsic C code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgirdwood @singalsu so we want, e.g. in line 139 in fft_common.c
plan->twiddle = __builtin_assume_aligned(fft_plan_allocate_twiddle(mod, size, bits), 8);
? I might be wrong but I think that these hints have local effect? E.g. where you have code like
static int do_calc(uint8_t *arg)
{
return INTRINSIC_OP_TAKING_8_BYTE_ARG(arg);
}
you could help the compiler by doing
static int do_calc(uint8_t *arg)
{
uint8_t *data = __builtin_assume_aligned(arg, 8);
return INTRINSIC_OP_TAKING_8_BYTE_ARG(data);
}
Then the compiler will know that data is aligned, so it won't generate unaligned pointer handling for the intrinsic op? So I think it should be used close to where those intrinsic operations are used? See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html and https://gcc.gnu.org/projects/tree-ssa/vectorization.html Also I don't see warnings being issued anywhere. Seems like that would be difficult when used as in those examples on linked sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh thats fine, as long as we can give the compiler the hint (and yes, this may be needed in more than one place) - the vectorizer will use this hint too and has better warnings about when it cant vectorize.
No description provided.